Abstract

A virtual multiple-input multiple-output (MIMO) wireless system using receiver-side cooperation
with the compress-and-forward (CF) protocol is an alternative to a point-to-point MIMO system when a
single receiver is not equipped with multiple antennas. The practicality of CF cooperation
will be greatly enhanced if an efficient source coding technique can be used at the relay. It is even
more desirable that CF cooperation should not be unduly sensitive to carrier frequency offsets (CFOs).
This paper presents a practical study of these two issues. Firstly, codebook designs of the Voronoi
vector quantization (VQ) and the tree-structure vector quantization (TSVQ) to enable CF cooperation
at the relay are described, and the two are compared in terms of codebook design and encoding complexity.
It is shown that the TSVQ is much simpler to design and operate, and can achieve a favorable
performance-complexity tradeoff. Furthermore, this paper demonstrates that CFO can lead to significant
performance degradation for the virtual MIMO system. To overcome this, it is proposed to maintain
clock synchronization and jointly estimate the CFO between the relay and the destination. This approach
is shown to provide a significant performance improvement.

Index Terms 

Virtual MIMO system, CF cooperation, source coding technique, codebook design, effects of carrier 
frequency offsets. 

I. Introduction

Multiple-input multiple-output (MIMO) systems have recently emerged as one of the most 
significant wireless techniques, as they can greatly improve spectral efficiency, channel capacity 
and link reliability of wireless communications [1]. These benefits have encouraged extensive 
research on a virtual MIMO system where the transmitter has multiple antennas and each of 
the receivers has a single antenna [2] [3]. When the transmitter does not have perfect channel 
state information (CSI) for the wireless link to each receiver, which is a common scenario in 
practical situations, single-antenna receivers can work together to form a virtual antenna array
and reap some performance benefits of MIMO systems [4] [5]. The idea of receiver-side local
cooperation is attractive for wireless networks since a wireless receiver may not be able to use
multiple antennas due to size and cost limitations. For example, suppose a customer carries two
mobile terminals: a single-antenna 3G (or 4G) enabled user device and a simple relay
device. Since the distance between the two devices is generally much shorter than that from the
base station, the two closely located devices could cooperate through short-range Wi-Fi,
Bluetooth, or Ultra-Wideband communications. With such cooperation, the customer could
expect traditional 2x2 MIMO benefits, as if the two antennas belonged to the intended
single-antenna user device.

Motivated by the above practical scenario, we consider such a cooperative virtual-MIMO 
system here, with one remote multi-antenna transmitter sending information to several closely 
spaced single-antenna receivers. Many of the techniques developed for MIMO systems can be
extended to this virtual-MIMO system. For example, we also implement the
bit-interleaved coded modulation (BICM) technique [6], which introduces a spatial and temporal bit
interleaver into the transmitter, to provide forward error correction (FEC) and improve system
performance. But unlike point-to-point MIMO systems, we need to perform cooperation among
the receivers. As for the cooperation protocol, since the relays (i.e. the assisting receivers)
are located close to the destination receiver in our scenario, the compress-and-forward (CF)
relay protocol provides superior performance [8] compared to amplify-and-forward
(AF) [7] and decode-and-forward (DF), and therefore serves as the best candidate for this system. Most previous
work on CF cooperation has focused on the classical three-terminal relay channel, such as [8]-[10].
An extension to a virtual-MIMO system is introduced in [2], and the theoretical achievable
cooperative capacity is analysed. Our recent work in [3] and [11] considers a virtual-MIMO system
with CF cooperation. Reference [11] focuses on the system performance assessment in terms
of the system throughput and error probabilities, while [3] designs an adaptive modulation and
cooperation scheme when minimum mean square error (MMSE) detection is used at the
destination. In both [3] and [11], only an optimal vector quantization (VQ), i.e. the Voronoi VQ,
is considered. To reduce the complexity and enhance the practicality of CF cooperation, this
paper proposes an alternative VQ, the tree-structure VQ.

For conventional MIMO systems, all the antennas at the transmitter are fed with the same 
clock, and the same holds for the receiver side. A major distinction of the virtual-MIMO system 
compared to the MIMO case is that cooperating antennas are running with different oscillator 
frequencies. Carrier frequency offset (CFO), which is caused by oscillator mismatch between the 
transmitter and the receiver, can be estimated and then compensated at the receiver in MIMO 
systems. The MIMO system performance will therefore not be severely degraded by CFO. 
However, for the virtual-MIMO system, receiver-side cooperating antennas need to estimate and 
compensate their CFOs independently. Their different residual CFOs will result in inter-block
interference and distort the correlation properties of the signals received at the destination, so that
the system performance may be impaired significantly. Estimation of CFO in a MIMO system
has been investigated in [12]-[14] using different estimation algorithms. When orthogonal
frequency division multiplexing (OFDM) is considered, [15] has addressed the effect of CFO
on channel estimation performance, and pilot sequences have been designed in [16] and [17] for
joint CFO and channel estimation. The estimation methods for CFO in [12]-[17] are only for
MIMO systems. In cooperative systems, the use of equalization has been proposed to mitigate
effects from CFOs, such as in [18] and [19], but only for a single-antenna transmitter. Our paper
focuses on the effects of CFO in the virtual-MIMO system. To the best of our knowledge, this is
the first study of the CFO effects in a cooperative system with a multi-antenna transmitter.

The main contributions of this paper are twofold. Firstly, we present a practical virtual-MIMO
system that implements CF cooperation with a standard source coding technique at the relay.
To perform source coding, we consider two codebook design algorithms, Voronoi VQ and
tree-structure vector quantization (TSVQ). To the best of our knowledge, this is the first time that
TSVQ has been applied to digital modulation signals. Their codebook design complexities and encoding
complexities are investigated. Simulation results show that the TSVQ approach we designed is
much simpler for encoding and more computationally efficient than the baseline Voronoi VQ.
Moreover, for practical considerations, this paper studies the effects of CFO, and demonstrates 
that CFO can lead to severe performance degradation for the virtual MIMO system. To overcome 
these effects, a clock synchronization and joint CFO estimation scheme is proposed, to exploit the 
benefits of MIMO CFO estimation. Simulation results show that the proposed scheme provides 
a significant performance advantage. 

The paper is organized as follows: Section II specifies the system model of the cooperative 
virtual-MIMO system. Codebook design methods and the corresponding complexity analyses 
are investigated in Section III. The effects of CFO and how they are overcome are illustrated in 
Section IV. Section V shows the simulation results and Section VI concludes the paper. 

II. Virtual-MIMO System with CF Cooperation 

A. Channel Model 

Consider a cooperative virtual-MIMO network with one remote $N_t$-antenna transmitter sending
information to $N_r$ collocated single-antenna receivers, as shown in Fig. 1. Since our research
focuses on investigating a practical implementation of CF cooperation for virtual-MIMO
systems, we start with a simple configuration with $N_t = N_r = 2$. At the transmitter, the information
bits are encoded through a rate-$R_b$ linear binary convolutional encoder. We implement the BICM
technique to provide FEC, so the coded bits are interleaved through a random bit interleaver
(int.). In each stream after the demultiplexer (or demux), groups of $m$ bits are mapped onto
complex data symbols via Gray-coded $2^m = M$-ary PSK or QAM modulation. Note that the
system studied here is not limited to BICM; other FEC coding schemes, such as Turbo coding
or LDPC coding, could also be employed according to different application requirements. 
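
The transmitter chain after the convolutional encoder can be sketched as follows. This is a minimal illustration rather than the implementation used in this paper: the encoder itself is omitted (the coded bits are taken as given), and the function names, the round-robin demultiplexing and the particular Gray labelling of the 16QAM axes are assumptions. The symbols are scaled so that the average power per symbol is $E_s/N_t$ with $E_s = 1$.

```python
import random

# Gray-coded 4-PAM levels per axis: neighbouring levels differ in exactly one bit.
GRAY_PAM4 = {(0, 0): -3, (0, 1): -1, (1, 1): 1, (1, 0): 3}

def interleave(bits, seed=0):
    """Random bit interleaver: returns the permuted bits and the permutation used."""
    perm = list(range(len(bits)))
    random.Random(seed).shuffle(perm)
    return [bits[p] for p in perm], perm

def deinterleave(bits, perm):
    """Invert the permutation applied by interleave()."""
    out = [0] * len(bits)
    for dst, src in enumerate(perm):
        out[src] = bits[dst]
    return out

def demux(bits, n_streams=2):
    """Split the interleaved bits into one stream per transmit antenna (round-robin)."""
    return [bits[i::n_streams] for i in range(n_streams)]

def map_16qam(bits, n_t=2):
    """Map groups of m = 4 bits to Gray-coded 16QAM with average power 1/n_t."""
    scale = (1.0 / (10 * n_t)) ** 0.5  # the unscaled grid has average energy 10
    symbols = []
    for k in range(0, len(bits) - 3, 4):
        i = GRAY_PAM4[(bits[k], bits[k + 1])]
        q = GRAY_PAM4[(bits[k + 2], bits[k + 3])]
        symbols.append(scale * complex(i, q))
    return symbols
```

At the receiver, `deinterleave` would be applied to the LLRs rather than to hard bits; the permutation handling is the same.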

Since the transmitter is some distance from the receiver group, a block fading channel model
with $N$ Rayleigh fading blocks is assumed here, each block having a length of $L$ symbol periods.
When we consider a single time instance $l$ for the $n$th channel block, the channel model is given by:

$$\begin{bmatrix} y_{rnl} \\ y_{dnl} \end{bmatrix} = H_n x_{nl} + w_{nl}, \quad \text{with } H_n = \begin{bmatrix} h_{1n} & h_{2n} \\ h_{3n} & h_{4n} \end{bmatrix}, \tag{1}$$

where $H_n$ denotes the $n$th block fading channel matrix, and each $h_{in}$ ($i \in [1, ..., 4]$) is
independent and identically distributed (i.i.d.). Without loss of generality, we assume normalized
Rayleigh fading, i.e. $E[|h_{in}|^2] = 1$. The complex scalars $y_{rnl}$ and $y_{dnl}$ are the signals at the relay
and destination receivers. We also define the vectors $x_{nl} = [x_{1nl}, x_{2nl}]^T$ and $w_{nl} = [w_{1nl}, w_{2nl}]^T$,
where the noise samples $w_{inl} \sim \mathcal{CN}(0, N_0)$. The average transmitted power per symbol is
$E[|x_{inl}|^2] = E_s/N_t$. We normalize the total power $E_s$ to unity, and the corresponding power
per bit is $E_b = E_s/(m R_b)$. We assume that perfect CSI is available at the receivers only.

As the destination and the relay receivers are closely spaced, it is reasonable to expect that a 
high capacity communication link with high reliability can be formed between them. Hence, as 
also considered in [20] and [21], we assume the two receivers cooperate by way of an error-free 
conference link, as shown in Fig. 1. We consider one-shot conference cooperation [21] [22], 
which requires the destination to decode the signal sent over the conference link. In practice, the 
short-range conference link is realized via an orthogonal channel (i.e. a different frequency band) 
to the transmitter array. Compared with the long data channel $H_n$, the orthogonal conference
link is short-range allowing much higher rate transmission, permitting it to be reused many times 
over the coverage area of the long range link. 
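
As an illustration of the block fading model in (1), the following sketch draws one Rayleigh block and applies it to a sequence of transmit vectors. The helper names and the use of Python's standard random generator are assumptions made for this example only.

```python
import random

def cn(var, rng):
    """Draw one circularly symmetric complex Gaussian sample CN(0, var)."""
    s = (var / 2) ** 0.5
    return complex(rng.gauss(0, s), rng.gauss(0, s))

def draw_block_channel(rng):
    """One Rayleigh block H_n = [[h1, h2], [h3, h4]] with E|h_i|^2 = 1."""
    return [[cn(1.0, rng), cn(1.0, rng)], [cn(1.0, rng), cn(1.0, rng)]]

def receive(H, x, n0, rng):
    """Apply (1) at one time instance: returns (y_r, y_d) for x = [x1, x2]."""
    y_r = H[0][0] * x[0] + H[0][1] * x[1] + cn(n0, rng)
    y_d = H[1][0] * x[0] + H[1][1] * x[1] + cn(n0, rng)
    return y_r, y_d

def simulate_block(x_seq, n0, rng):
    """Block fading: H_n is drawn once and held fixed for all L symbols of the block."""
    H = draw_block_channel(rng)
    return H, [receive(H, x, n0, rng) for x in x_seq]
```

The first row of `H` is the transmitter-to-relay channel and the second row the transmitter-to-destination channel, matching the ordering of $y_{rnl}$ and $y_{dnl}$ in (1).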



B. Compress-and-Forward Cooperation 

The conference link enables cooperation, and CF could reasonably be used since it provides a 
higher rate when the relay is closer to the destination [8]. To perform CF cooperation, a standard 
source coding technique is employed for practical considerations. We do not
employ the Wyner-Ziv (WZ) coding technique because, in the virtual-MIMO system with multiple
antennas at the transmitter, $y_{rnl}$ and $y_{dnl}$ are not highly correlated. The WZ
coding technique therefore does not improve the performance significantly [23] [11], but instead
introduces extra complexity. Since standard source coding is simpler and also performs well in
practical scenarios, we choose to implement it at the relay. That is, the relay is equipped with
a vector quantizer (VQ), as shown in Fig. 1. We define the quantization rate (i.e. source coding
rate) as $C$, which is measured in bits per compressed sample. As the two receivers cooperate
by way of an error-free conference link, the link capacity is equal to the source coding rate $C$.
A compressed version of the signal, $y'_{rnl}$, can be modeled by adding i.i.d. complex
Gaussian noise to $y_{rnl}$ [24],

$$y'_{rnl} = y_{rnl} + w_{cnl}, \tag{2}$$

where $w_{cnl}$ is the compression noise with variance $\sigma_{cn}^2$, i.e. $w_{cnl} \sim \mathcal{CN}(0, \sigma_{cn}^2)$. A lower bound
on the compression noise variance given in [25], which is computed using Shannon's rate-distortion
theory, can be extended to our virtual-MIMO system, i.e.,

$$\sigma_{cn}^2 = \frac{E[|y_{rnl}|^2]}{2^C - 1} = \frac{N_0 + |h_{1n}|^2 + |h_{2n}|^2}{2^C - 1}. \tag{3}$$

The general structure of the destination receiver is shown in Fig. 1. It receives signals $y_{dnl}$
from the transmitter, and observes signals from the relay via an intra-cluster receiver, written as
$y'_{rnl}$ because of the error-free conference link. We denote the received signals at the destination
as $y_{nl} = [\tilde{y}'_{rnl} \; y_{dnl}]^T$. The destination requires knowledge of $h_{1n}$, $h_{2n}$, and $\sigma_{cn}^2$ sent from the
relay. Assuming $w_{cnl}$ is i.i.d. complex Gaussian, $y'_{rnl}$ can then be scaled so that the scaled signal $\tilde{y}'_{rnl}$ and $y_{dnl}$
experience the same power level of additive Gaussian noise,

$$y_{nl} = [\tilde{y}'_{rnl} \; y_{dnl}]^T = \tilde{H}_n x_{nl} + [w_{1nl} \; w_{2nl}]^T, \tag{4}$$

$$\text{with } \tilde{H}_n = \begin{bmatrix} \sqrt{\eta_n} h_{1n} & \sqrt{\eta_n} h_{2n} \\ h_{3n} & h_{4n} \end{bmatrix}, \quad \eta_n = \frac{N_0}{N_0 + \sigma_{cn}^2}. \tag{5}$$

Here $w_{1nl} \sim$ i.i.d. $\mathcal{CN}(0, N_0)$, $\eta_n$ is the degradation factor due to the compression noise,
and $\tilde{y}'_{rnl} = \sqrt{\eta_n} y'_{rnl}$. The scaled channel matrix $\tilde{H}_n$ will help the destination to mitigate the
effects of the compression noise. Next the destination performs joint maximum-likelihood (ML)
demodulation of $y_{nl}$, computing the log-likelihood ratio (LLR) for each coded bit. Finally the
decoder accepts the deinterleaved LLRs of all coded bits and employs a soft-input Viterbi
algorithm to decode the signals. Thus with help from the relay, the single-antenna destination
receives signals from two transmit antennas. As shown in (3)-(5), with fixed SNR, a high source
coding rate $C$ will result in $\sigma_{cn}^2$ decreasing to 0, and then $\tilde{H}_n$ tends towards $H_n$ in value. A good
quality compression scheme with a high value of $C$ will allow the destination to use $y'_{rnl}$ for
MIMO decoding, and enable the virtual-MIMO to achieve almost ideal MIMO performance. 
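
The dependence of the degradation factor on the source coding rate can be checked numerically. The sketch below evaluates (3) and (5) directly; the function names are chosen for this illustration only. After scaling by $\sqrt{\eta_n}$, the relay branch carries noise of power $\eta_n (N_0 + \sigma_{cn}^2) = N_0$, which is the equal-noise-power property used in (4).

```python
def compression_noise_var(h1, h2, n0, c_bits):
    """sigma_cn^2 from (3) for source coding rate C = c_bits."""
    return (n0 + abs(h1) ** 2 + abs(h2) ** 2) / (2 ** c_bits - 1)

def degradation_factor(n0, sigma2_c):
    """eta_n from (5)."""
    return n0 / (n0 + sigma2_c)

def scaled_channel(H, eta):
    """H-tilde from (5): relay row scaled by sqrt(eta), destination row unchanged."""
    r = eta ** 0.5
    return [[r * H[0][0], r * H[0][1]], [H[1][0], H[1][1]]]

if __name__ == "__main__":
    # eta_n grows towards 1 as C increases, so H-tilde approaches H.
    for c_bits in (2, 4, 6, 8):
        s2 = compression_noise_var(1.0, 1j, 0.1, c_bits)
        print(c_bits, degradation_factor(0.1, s2))
```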

III. Vector Quantization Design at the Relay 

To perform CF cooperation, a standard source coding technique (i.e. VQ) is employed, shown 
in Fig. 1. The key tasks of the relay thus include constructing a good codebook, quantizing the 
received signals, and forwarding the compressed signals $y'_{rnl}$ to the destination. The codebook
design techniques and the corresponding complexities will be analysed in this section. 

A. Codebook Design 

The codebook design at the relay is based on the number of codebook vectors, which equals
$2^C$, and requires knowledge of the noise-free constellation. Note that, besides signal symbols,
some control information such as the modulation type is also transmitted on control channels in
practice. It is reasonable to expect that the relay can construct the noise-free constellation of the
received signals, i.e. the constellation of $h_{1n} x_{1nl} + h_{2n} x_{2nl}$, denoted by $y^c_{rnl}$. The codebook design,
which uses $y^c_{rnl}$ as the training vectors, will be simple and efficient. In this paper, Voronoi VQ and
TSVQ [26] are employed at the relay. Voronoi VQ is considered first, as it has the advantage that
the codebook is optimal in the sense of minimising average distortion. To design it, the
Linde-Buzo-Gray (LBG) algorithm, which is based on the iterative use of codebook modification, is
used [10]. However, Voronoi VQ implies high computational and search complexity, especially
for high-order modulations and large C, as will be shown in Section III-B. 
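
The iterative codebook modification at the core of the LBG algorithm can be sketched as follows, using the noise-free constellation as the training set. This is an illustrative sketch under stated assumptions, not the design code used for the results in this paper: the initial codebook (evenly spaced picks from the sorted training set), the stopping tolerance and all function names are choices made for this example, and the training set is assumed to contain at least $2^C$ points.

```python
import itertools

def qam16():
    """Unnormalised 16QAM alphabet."""
    levels = (-3, -1, 1, 3)
    return [complex(i, q) for i in levels for q in levels]

def noise_free_constellation(h1, h2, alphabet):
    """All values of h1*x1 + h2*x2, i.e. the training set y^c."""
    return [h1 * x1 + h2 * x2 for x1, x2 in itertools.product(alphabet, repeat=2)]

def nearest(codebook, y):
    """Exhaustive search for the index of the codevector closest to y."""
    return min(range(len(codebook)), key=lambda k: abs(y - codebook[k]) ** 2)

def lbg(training, c_bits, max_iter=100, tol=1e-9):
    """Alternate nearest-codevector partitioning and centroid updates."""
    size = 2 ** c_bits
    step = len(training) / size
    # Assumed initialisation: evenly spaced picks from the sorted training set.
    ordered = sorted(training, key=lambda v: (v.real, v.imag))
    codebook = [ordered[int(k * step)] for k in range(size)]
    history = []  # average distortion measured at the start of each iteration
    for _ in range(max_iter):
        cells = [[] for _ in range(size)]
        dist = 0.0
        for y in training:
            k = nearest(codebook, y)
            cells[k].append(y)
            dist += abs(y - codebook[k]) ** 2
        history.append(dist / len(training))
        # Move each codevector to the centroid of its cell; empty cells stay put.
        codebook = [sum(c) / len(c) if c else codebook[k] for k, c in enumerate(cells)]
        if len(history) > 1 and history[-2] - history[-1] <= tol:
            break
    return codebook, history
```

For example, `lbg(noise_free_constellation(h1, h2, qam16()), 4)` designs a 16-vector codebook for the 256-point combined constellation. Each iteration evaluates $2^C Q$ distortions, which is why the cost grows quickly with the modulation order and with $C$.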

To reduce the complexity, we also consider TSVQ for high-order modulations. Unlike the
Voronoi VQ, which requires a global and exhaustive search, the selection of the codebook vector
in TSVQ can be divided into several stages, with a different design method allowed for each.
Simulation results suggest that the combination of two high-order rotationally symmetric
constellations received over two paths is also rotationally symmetric. For example, as shown in Fig. 2 (a), the combination
of two 16QAM constellations from two paths, $y^c_{rnl}$, is four-fold rotationally symmetric: the origin
is the centre of rotation, and $\pi/2$ is the angle of rotation. That is, the vectors of $y^c_{rnl}$ look the
same after $\pi/2$ rotations are applied. The first stage quantization of TSVQ can therefore come
from the rotational classification, i.e. classifying $y^c_{rnl}$ into subsets based on the rotational
symmetry, as illustrated in Fig. 2 (b). The sub-codebook designed for one subset is then
easily extended to the whole codebook, with the phase angles of the sub-codevectors changing by
$\pi/2$ each time. The final stage of the TSVQ can be determined by applying the LBG algorithm
to the subsets to obtain optimal final-stage codebooks. If there are more than two stages, the
middle stage can be resolved by using the LBG algorithm, or by classification according to
some scheme (e.g., magnitude and phase). The disadvantage of LBG for the middle stage is that it
not only needs to store the entire training subsets corresponding to the sub-codebooks for further
stage TSVQ design, but also implies a higher design complexity. For practical considerations, the
classification method is a better choice for designing the middle stage of TSVQ. Here we denote the
quantization rates for the three stages of TSVQ as $C_1$, $C_2$ and $C_3$, and we have $C = C_1 + C_2 + C_3$.

Fig. 2 demonstrates a codebook design example where we suppose the two signals from
the two transmit antennas are 16QAM modulated. A set of $y^c_{rnl}$ for one choice of channel coefficients
is shown in Fig. 2 (a). The points can first be divided into four subsets (i.e. $C_1 = 2$
bits/sample) according to their rotational symmetry. For one subset, we sort the constellation
points in ascending order by their magnitudes and phases, and divide the subset into two groups
in ascending order, i.e. $C_2 = 1$ bit/sample. After the two stages of TSVQ, we have 8 groups,
labeled (1) - (8), as shown in Fig. 2 (c). Then the LBG algorithm is employed twice, for groups
(1) and (2), to obtain the sub-codebook for one quadrant. The final stage quantization for
TSVQ is thus completed by phase rotations. Compared to the Voronoi VQ used in Fig. 2 (d), the
TSVQ we implemented here is much simpler to design and can achieve similar performance, as
will be shown in Section V. The illustrative example we present here uses $C_2 = 1$ bit/sample;
a higher value of $C_2$ could be considered to further decrease the computational complexity, or
one can set $C_2 = 0$ bits/sample for a high-accuracy quantization.

Note that depending on the source coding rate C, there may not be enough codebook vectors 
to cover all noise-free constellation points. This is evident in Fig. 2 (d) for example, where one 
codebook vector lies among several adjacent modulation symbols. We rely then on the MIMO 
decoding at the destination to resolve ambiguities and correctly decode the data. 
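
The three-stage design described in this subsection can be sketched as follows for the 16QAM example. This is a minimal sketch under stated assumptions, not the implementation behind Fig. 2: the sector test used for the rotational classification, the equal-size split of the sorted quadrant, the magnitude thresholds used by the encoder and the initialisation of the Lloyd/LBG routine are all choices made for this illustration.

```python
import cmath
import itertools
import math

ROT = (1 + 0j, 1j, -1 + 0j, -1j)  # rotations by 0, pi/2, pi and 3*pi/2

def noise_free_constellation(h1, h2):
    """Training set y^c: every h1*x1 + h2*x2 for a 16QAM alphabet."""
    qam = [complex(i, q) for i in (-3, -1, 1, 3) for q in (-3, -1, 1, 3)]
    return [h1 * x1 + h2 * x2 for x1, x2 in itertools.product(qam, repeat=2)]

def quadrant(y):
    """First stage (C_1 = 2 bits): index of the pi/2-wide sector containing y."""
    return int((cmath.phase(y) % (2 * math.pi)) // (math.pi / 2)) % 4

def lloyd(training, size, iters=50):
    """Compact Lloyd/LBG refinement (the initialisation is an assumption)."""
    ordered = sorted(training, key=abs)
    step = len(ordered) / size
    cb = [ordered[int(k * step)] for k in range(size)]
    for _ in range(iters):
        cells = [[] for _ in range(size)]
        for y in training:
            cells[min(range(size), key=lambda k: abs(y - cb[k]))].append(y)
        cb = [sum(c) / len(c) if c else cb[k] for k, c in enumerate(cells)]
    return cb

def design_tsvq(training, c2, c3):
    """Design sub-codebooks for sector 0 only; other sectors reuse them by rotation."""
    base = sorted((y for y in training if quadrant(y) == 0),
                  key=lambda y: (abs(y), cmath.phase(y)))
    n_groups = 2 ** c2
    per = len(base) // n_groups
    # Second stage (C_2 bits): cut the sorted sector into equal-size groups.
    groups = [base[g * per:(g + 1) * per] for g in range(n_groups)]
    # Hypothetical encoder rule: thresholds midway between neighbouring groups.
    thresholds = [(abs(groups[g][-1]) + abs(groups[g + 1][0])) / 2
                  for g in range(n_groups - 1)]
    # Final stage (C_3 bits): LBG on each group.
    sub = [lloyd(group, 2 ** c3) for group in groups]
    return sub, thresholds

def encode_tsvq(sub, thresholds, y):
    """Return the index triple (sector, group, codevector) and the reconstruction."""
    q = quadrant(y)
    y0 = y * ROT[q].conjugate()  # rotate the sample back into sector 0
    g = sum(abs(y0) > t for t in thresholds)
    k = min(range(len(sub[g])), key=lambda i: abs(y0 - sub[g][i]))
    return (q, g, k), ROT[q] * sub[g][k]
```

With `c2=1` and `c3=3`, LBG runs on two groups of 32 training vectors instead of on all 256, and the full codebook of $2^6$ vectors is obtained by rotating the 16 designed codevectors through the four sectors.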

B. Complexity Analysis 

As described above, both the Voronoi VQ and TSVQ use the LBG algorithm, but with
different sizes of training sequence. The codebook design complexity therefore comes from the
computational complexity of LBG. The computational time for the LBG algorithm is [27]:

$$T_{\mathrm{LBG}} = I_s 2^C Q T_d + I_s (2^C - 1) Q T_c, \tag{6}$$

where $2^C$ is the codebook size, $I_s$ is the number of iterations, and $Q$ is the number of training
vectors. $T_d$ and $T_c$ denote the computational time for evaluating one distortion value and for comparing two
distortion values, respectively. For the Voronoi VQ, which runs the LBG algorithm on the
whole noise-free constellation $y^c_{rnl}$, we have $Q = M^2 = 2^{2m}$. Since $I_s < Q/2^C$ [27], we obtain:

$$T_{d,\mathrm{Voronoi}} = T_{\mathrm{LBG}} \leq 2^{4m} T_d + 2^{4m} T_c. \tag{7}$$

For the TSVQ we implemented, the computational time is the summation of three stages,

$$T_{d,\mathrm{TSVQ}} \leq \left[ \frac{2^{2m}}{4} T_d + \frac{2^{2m}}{8} \left( \frac{2^{2m}}{4} - 1 \right) T_c \right] \mathrm{sgn}(C_2) + 2^{C_2} \left[ \frac{2^{4m}}{2^{2(C_1+C_2)}} T_d + \frac{2^{4m} (2^{C_3} - 1)}{2^{2(C_1+C_2)} 2^{C_3}} T_c \right], \tag{8}$$

where $\mathrm{sgn}(C_2)$ denotes the signum function of $C_2$. A justification for (8) is as follows. In
accordance with the multistage TSVQ we designed in Section III-A, we have $C_1 = 2$ bits/sample
for the first stage of TSVQ. We first consider the case $C_2 > 0$. In one quadrant, there
are $2^{2m}/4$ vectors, the distortion values of which are to be computed and sorted to obtain
several separate groups for the second stage quantization. Here the selection sort algorithm
is considered, requiring $\frac{2^{2m}}{4} \left( \frac{2^{2m}}{4} - 1 \right) / 2$ comparison operations. For the final stage, the LBG
algorithm is implemented $2^{C_2}$ times for the $2^{C_2}$ groups in one quadrant. In one group, there
are $Q = 2^{2m}/2^{C_1+C_2}$ training vectors and the codebook size is $2^{C_3}$. So we obtain an upper
bound on $T_{d,\mathrm{TSVQ}}$ as shown in (8). For the case $C_2 = 0$, the computational
complexity comes only from the LBG algorithm used in the final stage quantization, where $2^{2m}/2^{C_1}$
training vectors are considered.

For example, when we consider $C_1 = 2$ bits/sample and $C_2 = 1$ bit/sample, we have
$T_{d,\mathrm{Voronoi}} = O(2^{4m}(T_d + T_c))$ and $T_{d,\mathrm{TSVQ}} = O(2^{4m-5} T_d + 2^{4m-4} T_c)$ for a high-order modulation
(e.g. 16QAM) and a large value of $C$. That is, compared with the Voronoi VQ, TSVQ decreases
the computational complexity for the codebook design significantly. 
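
The design-time bounds (7) and (8) can be evaluated numerically to make the comparison concrete. The sketch below returns the coefficients multiplying $T_d$ and $T_c$; the function names are hypothetical, and the sorting term takes the sector size as $2^{2m}/2^{C_1}$, which equals the $2^{2m}/4$ of the text for $C_1 = 2$.

```python
def voronoi_design_bound(m):
    """Coefficients (of T_d, of T_c) in the upper bound (7), with Q = 2^(2m)."""
    q_squared = 2 ** (4 * m)
    return q_squared, q_squared

def tsvq_design_bound(m, c1, c2, c3):
    """Coefficients (of T_d, of T_c) in the upper bound (8)."""
    td = 0.0
    tc = 0.0
    if c2 > 0:  # sgn(C_2) term: distortions plus a selection sort in one sector
        n = 2 ** (2 * m) / 2 ** c1
        td += n
        tc += n * (n - 1) / 2
    q = 2 ** (2 * m) / 2 ** (c1 + c2)  # training vectors per final-stage group
    runs = 2 ** c2                      # LBG is run once per group
    td += runs * q * q
    tc += runs * q * q * (2 ** c3 - 1) / 2 ** c3
    return td, tc

if __name__ == "__main__":
    # 16QAM (m = 4) with C_1 = 2, C_2 = 1, C_3 = 3
    print(voronoi_design_bound(4))        # (65536, 65536)
    print(tsvq_design_bound(4, 2, 1, 3))  # (2112.0, 3808.0)
```

For this setting the $T_d$ coefficient is dominated by the final-stage term $2^{4m-5} = 2048$, and the $T_c$ coefficient stays below $2^{4m-4} = 4096$, in line with the orders quoted above.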

As to the symbol encoding, TSVQ also allows a faster codebook search. Specifically, the
encoding algorithm for a Voronoi VQ can be viewed as an exhaustive search algorithm. For a
codebook of size $2^C$, the codevector selection for one symbol requires $2^C$ distortion evaluations
and $2^C - 1$ comparisons. The required time to search the codebook for one symbol is:

$$T_{s,\mathrm{Voronoi}} = 2^C T_d + (2^C - 1) T_c. \tag{9}$$

For the TSVQ we designed, the search procedure includes two steps: finding the appropriate
group, and performing a full search on the group. Thus the search time of TSVQ is,

$$T_{s,\mathrm{TSVQ}} = (C_2 + 2^{C_3}) T_d + (C_2 + 2^{C_3} - 1) T_c. \tag{10}$$

Therefore, compared with the Voronoi VQ, the multistage TSVQ not only decreases the
computational complexity for the codebook design, but also allows a faster codebook search
for the encoding. Since the multistage TSVQ can also achieve a good performance, it is a better
choice to enable CF cooperation in practice.

IV. Effects of Carrier Frequency Offset

With CF cooperation, the destination may expect traditional MIMO benefits in the virtual-MIMO
system. However, due to the different oscillator frequencies at the source, relay and
destination receivers, carrier frequency offsets (CFOs) occur [18] [28]. They cause severe
performance degradation, since CFOs between the transmitter-to-relay and the
transmitter-to-destination channels can result in inter-block interference and distort the correlation properties of
the signals at the final destination. This section thus focuses on the effects of CFO in the
virtual-MIMO system. A clock synchronization and joint CFO estimation scheme is then proposed.

A. CFO Estimation 

CFOs cause continuous phase rotations of the corresponding signals, and thus impair the
detection performance. To alleviate this effect, the CFOs have to be estimated and then
compensated at the receivers. The estimation of CFO in a MIMO system has been investigated in the
literature [12]-[14]. The methods in [12] and [13] are pilot aided, requiring training sequences,
whereas [14] develops a blind CFO estimation technique. The CFO estimator in [12] is based
on the measurement of the phase shift between consecutive channel estimation sequences. As it
is simple and effective in practice, we first implement it at the relay and the destination as a
baseline case.

Since the $N_t$ antennas at the transmitter are fed with the same clock, frequency offsets at the relay
and destination are defined as $f_{rn}$ and $f_{dn}$, respectively. We let $[u_{nl}, u_{n(l+1)}]$ denote the channel
estimation sequence, which occupies two symbol periods as $N_t = 2$, and $u_{nl}$ is an $N_t \times 1$ vector
transmitted through the channel matrix $H_n$ [12]. Channel estimation sequences are transmitted
continuously on a dedicated pilot channel. The received signals at the relay and destination
receivers corresponding to the channel estimation sequence $u_{nl}$ at time $t_{nl}$ are then given by [12]
[15],

$$\begin{bmatrix} y^u_{rnl} \\ y^u_{dnl} \end{bmatrix} = \begin{bmatrix} e^{j(2\pi f_{rn} t_{nl})} & 0 \\ 0 & e^{j(2\pi f_{dn} t_{nl})} \end{bmatrix} H_n u_{nl} + w^u_{nl}, \tag{11}$$

where $w^u_{nl}$ is the $N_t \times 1$ complex Gaussian noise vector, with the noise samples $\sim \mathcal{CN}(0, N_0)$.
Equation (11) represents the phase rotations of the received signals caused by CFOs.
According to [12], we let the matrix $P_{rn1}$ be formed by collecting the received signals at the relay
corresponding to $[u_{nl}, u_{n(l+1)}]$, i.e. $P_{rn1} = [y^u_{rnl}, y^u_{rn(l+1)}]$, and let $P_{rn2}$ be defined as containing
that pertaining to the next transmission of the channel estimation sequence. Likewise, $P_{dn1}$ and
$P_{dn2}$ are defined for the destination. Then the CFOs $f_{rn}$ and $f_{dn}$ can be estimated as,

$$\hat{f}_{rn} = \arg[P_{rn2} \cdot P_{rn1}^\dagger] / (2\pi \cdot 2 t_s), \tag{12}$$

$$\hat{f}_{dn} = \arg[P_{dn2} \cdot P_{dn1}^\dagger] / (2\pi \cdot 2 t_s), \tag{13}$$

where $\arg[\cdot]$ denotes the angle operation, $t_s$ denotes the symbol interval, and $\{\cdot\}^\dagger$ represents the
Hermitian transpose.

B. Clock Synchronization and Joint CFO Estimation Scheme

The basic motivation for the proposed scheme is the fact that CFO causes severe performance
degradation for the 2x2 virtual MIMO system. It is shown in [12] and [28] that a higher
number of transmit and receive antennas, e.g. 4x4 and 8x8 MIMO systems, leads to significant
improvement in the accuracy of the CFO estimate. However, for the simpler 2x2 MIMO or 2x1
MISO systems, residual CFO will still impair the detection performance. Meanwhile, cooperative
virtual-MIMO systems also suffer performance degradation compared to conventional MIMO,
where all the antennas at the receiver are fed with the same clock. In virtual-MIMO systems,
cooperating antennas running with different clocks must estimate and compensate
their CFOs independently, as shown in equations (12) and (13). Their residual CFOs, i.e. $(\hat{f}_{rn} - f_{rn})$
and $(\hat{f}_{dn} - f_{dn})$, are different and independent. Since the operation at the relay does not cause
extra phase rotations, the different residual CFOs will then distort the correlation properties of the
received signals at the destination, so that the system performance is impaired. To mitigate this
effect, the use of frequency-domain equalization has been proposed in cooperative systems [18]
[19], but only for a single-antenna transmitter. For our virtual-MIMO system, which has multiple
antennas transmitting different data, we propose to maintain clock synchronization across the
receivers, and then jointly estimate the CFO between both terminals.

The proposed scheme involves two steps. Clock synchronization is the first step, providing
a common notion of time across the relay and the destination. State-of-the-art synchronization
algorithms are described in [29], assuming that the transmissions for delivering time information
are line of sight. To estimate the frequency difference between the receivers, a two-way message
exchange (which is a classical timing message signaling approach) could be implemented.
Consider the destination as the reference node, so that the relay needs to synchronize with
the destination. Timing messages must be exchanged several times to achieve a certain
synchronization accuracy. The stable nature of the error-free conference link between the relay
and the destination is helpful for this synchronization process. In practice, the conference link
is short range, and it is reasonable to expect that in many cases the conference link is a reliable
line-of-sight link with high bandwidth, which may support frequent timing message exchanges.
Hence, on the stable short-range conference link, the relay and the destination could maintain
clock synchronization.

February 11, 2013 DRAFT

The clock synchronization approach studied here focuses specifically on frequency locking,
as the effect of frequency offset is the main reason why clock offsets drift over time [30].
Compensating the frequency offset guarantees long-term reliability of synchronization, so that
re-synchronization only needs to be performed for each $H_n$. Without loss of generality, after the
clock synchronization, frequency offsets at the relay and the destination are modelled by (as
shown in Fig. 3),

$$f_{rn} = f_{dn} + \Delta_n, \tag{14}$$

where $\Delta_n$ denotes the frequency synchronization error, which is determined by the
synchronization algorithm, the delays in timing message delivery, and the number of observations of
timing messages. Efficient algorithms relying on two-way message exchanges have been reported
in [29]. For algorithms that do not compensate the frequency offsets, such as the timing-sync
protocol for sensor networks (TPSN), synchronization has to be performed more frequently
to maintain the required accuracy. The algorithm in [30] computes and corrects the frequency
offsets, and thus serves as a good candidate for our system. It is reported that the frequency
synchronization error decreases as the number of timing messages exchanged increases.

The second step of the proposed scheme is to perform joint CFO estimation between the
relay and the destination, to obtain the benefit of MIMO CFO estimation. Specifically, the relay
computes $J_{rn} = P_{rn2} \cdot P_{rn1}^\dagger$ and transmits it to the destination via the conference link. The
destination receives $J_{rn}$, computes $J_{dn} = P_{dn2} \cdot P_{dn1}^\dagger$, and estimates $f_{dn}$ based on $J_{rn}$ and $J_{dn}$,

$$\hat{f}_{dn} = \arg[J_{rn} + J_{dn}] / (2\pi \cdot 2 t_s). \tag{15}$$

The estimated CFO is then shared with the relay, i.e. $\hat{f}_{rn} = \hat{f}_{dn}$, to help the relay compensate
its CFO $f_{rn}$ before vector quantization. Here we compare $\hat{f}_{dn}$ against the estimated CFO of
the corresponding MIMO system, which is denoted by $\hat{f}_n$. Given a specific channel condition,
when the synchronization error $\Delta_n = 0$, $\hat{f}_{dn}$ computed using the joint CFO estimation
scheme equals $\hat{f}_n$ in value, and therefore the benefits of MIMO CFO estimation are exploited.
A brief proof is as follows. For the MIMO system, the received signal corresponding to $u_{nl}$
is given by $y^u_{nl} = H_n u_{nl} e^{j(2\pi f_n t_{nl})} + w^u_{nl}$. According to [12], $P_{n1}$ is defined as $[y^u_{nl}, y^u_{n(l+1)}]$, and
$P_{n2}$ is that for the next transmission of the channel estimation sequence. Then we get a square matrix
$J_n = P_{n2} \cdot P_{n1}^\dagger$, so that $f_n$ can be estimated by,

$$\hat{f}_n = \arg[\mathrm{tr}(J_n)] / (2\pi \cdot 2 t_s), \tag{16}$$

where $\mathrm{tr}(\cdot)$ denotes the trace operation. If $\Delta_n = 0$, we have $y^u_{nl} = [y^u_{rnl} \; y^u_{dnl}]^T$, so that,



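$$\mathrm{tr}(P_{n2} \cdot P_{n1}^\dagger) = \mathrm{tr}\left( \begin{bmatrix} P_{rn2} \\ P_{dn2} \end{bmatrix} \cdot \begin{bmatrix} P_{rn1}^\dagger & P_{dn1}^\dagger \end{bmatrix} \right) = P_{rn2} \cdot P_{rn1}^\dagger + P_{dn2} \cdot P_{dn1}^\dagger. \tag{17}$$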

Substituting (17) into (16), and comparing it with (15), we finally get $\hat{f}_{dn} = \hat{f}_{rn} = \hat{f}_n$. The
joint CFO estimation scheme therefore exploits spatial diversity of the MIMO receiver (when
$\Delta_n = 0$), and offers significant improvement compared to (13). The performance of the joint
CFO estimation scheme is thus lower bounded by the perfect frequency-synchronized case.
For the case $\Delta_n \neq 0$, at high SNR, $\hat{f}_{dn}$ will tend toward $f_{dn} + \Delta_n/2$,

$$\hat{f}_{rn} = \hat{f}_{dn} = f_{dn} + \Delta_n/2 = f_{rn} - \Delta_n/2. \tag{18}$$



This is because the estimate $\hat{f}_{dn}$ from (15) is based on two CFO observations. When $\Delta_n \neq 0$,
as the SNR increases, $\hat{f}_{dn}$ approaches the average of $f_{dn}$ and $f_{rn}$. Thus for both the relay
and the destination, the magnitude of the CFO mismatch is $\Delta_n/2$. In this case, the joint CFO
estimation scheme may still take advantage of the cooperative estimation, but the benefit is
reduced as $\Delta_n$ increases. Different values of $\Delta_n$ represent different degrees of synchronization
between the relay and the destination. The distribution of $\Delta_n$ is complicated, and depends on the
synchronization algorithm, the various delays in timing message delivery and the number of timing
messages exchanged, as mentioned before. In this paper, we consider a fixed value of $\Delta_n$ for
all channel conditions and drop its subscript $n$ in the simulations (in Section V), in order to
demonstrate the effect of the value of $\Delta$ on performance.
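
The behaviour described by (15) and (18) can be checked numerically. The following is a minimal sketch under simplifying assumptions that are not taken from the paper: each node sees a scalar channel with unit-magnitude gain (so both nodes receive equal pilot energy), the pilot block has two symbols, and the two pilot transmissions are spaced $2t_s$ apart so that the phase rotation matches the denominator of (15). The function names are hypothetical.

```python
import numpy as np

T_S = 71.4e-6   # symbol interval quoted in Section V-B
GAP = 2 * T_S   # assumed spacing of the two pilot transmissions, matching 2*t_s in (15)

def pilot_pair(h, pilots, cfo_hz, noise_std, rng):
    """Two received copies of one pilot block, GAP seconds apart (hypothetical model)."""
    t = np.arange(len(pilots)) * T_S
    clean = h * pilots * np.exp(2j * np.pi * cfo_hz * t)

    def noise():
        n = rng.standard_normal(len(pilots)) + 1j * rng.standard_normal(len(pilots))
        return noise_std * n / np.sqrt(2)

    p1 = clean + noise()
    p2 = clean * np.exp(2j * np.pi * cfo_hz * GAP) + noise()
    return p1, p2

def correlation(p1, p2):
    """J = P2 . P1^dagger, a scalar for a single-antenna node."""
    return np.vdot(p1, p2)  # vdot conjugates its first argument

def joint_estimate(j_r, j_d):
    """Equation (15): CFO estimate from the summed relay and destination correlations."""
    return np.angle(j_r + j_d) / (2 * np.pi * GAP)

def demo(f_d, delta, noise_std=0.0, seed=0):
    """Joint estimate when the destination CFO is f_d and the relay CFO is f_d + delta."""
    rng = np.random.default_rng(seed)
    pilots = np.array([1 + 0j, -1 + 0j])
    h_r, h_d = np.exp(2j * np.pi * rng.random(2))  # unit-magnitude gains (assumption)
    j_r = correlation(*pilot_pair(h_r, pilots, f_d + delta, noise_std, rng))
    j_d = correlation(*pilot_pair(h_d, pilots, f_d, noise_std, rng))
    return joint_estimate(j_r, j_d)

print(demo(200.0, 0.0))   # noise-free, delta = 0: approximately f_d
print(demo(200.0, 40.0))  # noise-free, delta = 40 Hz: approximately f_d + delta/2
```

With equal pilot energy at both nodes, the noise-free estimate is exactly the midpoint of the two CFOs. With unequal channel gains the sum in (15) weights each observation by its received energy, so the midpoint in (18) should be read as the typical behaviour and not as an exact identity for every channel realization.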

Combining these two steps gives the clock synchronization and joint CFO estimation scheme, which
counteracts the performance degradation caused by CFOs. Note that the joint CFO estimation
studied here is based on McKeown's algorithm in [12], but our approach can be applied to other
estimation algorithms with suitable changes to the information shared to complete the joint CFO
estimation operation. For example, if OFDM is used, the pilot symbols to be transmitted will be
grouped into blocks, each block will be transformed by the IDFT, and cyclic prefixes or some zeros
will be added to the tail of each transformed block [15]. The pilot blocks could be designed
as discussed in [16] and [17], and the CFO estimation algorithm will be selected accordingly,
with the goal of providing a small estimation mean squared error. At the receiver side, using our
proposed scheme, clock synchronization is performed first such that the CFOs at the relay and
the destination are $f_{dn} + \Delta$ and $f_{dn}$, respectively. Then the relay and destination perform joint
CFO estimation using the selected estimation algorithm (corresponding to the dedicated pilot
design), and thus obtain the benefits of MIMO CFO estimation. Note that, with OFDM, CFO
estimation and compensation are performed before the serial-to-parallel conversion [15] [16].
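
To illustrate the ordering of operations in the OFDM case, the sketch below builds one pilot block with an IDFT, applies a CFO to the serial time-domain samples, and compensates it before the DFT. It is a generic illustration and not the pilot design of [16] or [17]: the block length, sample period and CFO value are hypothetical, and the cyclic prefix is prepended here, which is one common placement.

```python
import numpy as np

def ofdm_pilot_block(pilots_freq, cp_len):
    """IDFT of one pilot block, with a cyclic prefix prepended (assumed placement)."""
    time_block = np.fft.ifft(pilots_freq)
    return np.concatenate([time_block[-cp_len:], time_block])

def apply_cfo(samples, cfo_hz, sample_period):
    """Rotate serial time-domain samples by a carrier frequency offset."""
    n = np.arange(len(samples))
    return samples * np.exp(2j * np.pi * cfo_hz * n * sample_period)

def receive(samples, cfo_estimate_hz, sample_period, cp_len):
    """Compensate the CFO on the serial stream, then drop the prefix and take the DFT."""
    corrected = apply_cfo(samples, -cfo_estimate_hz, sample_period)
    return np.fft.fft(corrected[cp_len:])

rng = np.random.default_rng(1)
# Hypothetical QPSK pilots on 64 subcarriers, 1 microsecond sample period, 300 Hz CFO.
pilots = (rng.choice([-1.0, 1.0], 64) + 1j * rng.choice([-1.0, 1.0], 64)) / np.sqrt(2)
tx = ofdm_pilot_block(pilots, cp_len=16)
rx = apply_cfo(tx, cfo_hz=300.0, sample_period=1e-6)

recovered = receive(rx, cfo_estimate_hz=300.0, sample_period=1e-6, cp_len=16)
uncorrected = receive(rx, cfo_estimate_hz=0.0, sample_period=1e-6, cp_len=16)
print(np.allclose(recovered, pilots), np.allclose(uncorrected, pilots))
```

Compensating on the serial samples removes the offset before it can spread energy across subcarriers, which is why the correction precedes the serial-to-parallel conversion.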

V. Numerical Results 

In this section, we present the error performance of our cooperative virtual-MIMO system
($N_t = N_r = 2$). At the transmitter, a binary convolutional code is assumed with the generator
polynomials $[133, 171]_{\mathrm{octal}}$ ($R_b = 1/2$). Gray-labeled QPSK or 16QAM modulation is considered.
The channels between the transmitter and receivers are assumed to be i.i.d. normalized block
Rayleigh fading, with $10^6$ fading blocks, each spanning 196 consecutive symbol periods.
The simulation results are obtained using the Monte Carlo method. We plot the bit error ratio
(BER) or block error ratio (BLER) against the information bit SNR, i.e. $E_b/N_0$.
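
For readers who wish to reproduce the transmitter, the rate-1/2 convolutional code with generators $[133, 171]_{\mathrm{octal}}$ can be sketched as below. The bit-ordering convention (newest input bit at the most significant tap) and the absence of tail bits are assumptions of this sketch; the paper does not state them.

```python
G = (0o133, 0o171)  # generator polynomials from the text
K = 7               # constraint length implied by the 7-bit generators

def conv_encode(bits):
    """Rate-1/2 feedforward convolutional encoder (no termination; convention assumed)."""
    reg = 0
    out = []
    for b in bits:
        reg = (reg >> 1) | (b << (K - 1))  # newest bit enters at the MSB
        for g in G:
            out.append(bin(reg & g).count("1") % 2)  # parity of the tapped bits
    return out

# The impulse response reads out both generators bit by bit.
impulse = conv_encode([1, 0, 0, 0, 0, 0, 0])
print(impulse[0::2])  # taps of 133 octal: [1, 0, 1, 1, 0, 1, 1]
print(impulse[1::2])  # taps of 171 octal: [1, 1, 1, 1, 0, 0, 1]
```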

A. BER Evaluation of VQ Design 

The BLER performance of the cooperative virtual-MIMO system with the Voronoi VQ under
various quantization rates is shown in Fig. 4. The BLERs are compared against the corresponding
ideal MIMO system and the non-cooperative MISO system. Fig. 4 shows that, with Voronoi
VQ at the relay, $C = 7$ bits/sample for 16QAM and $C = 4$ bits/sample for QPSK modulation
are sufficient for the system with CF cooperation to approach ideal MIMO performance.

As mentioned above, to decrease the complexity, we propose to employ TSVQ to design the
codebook at the relay. For $C = 6$ bits/sample, Fig. 5 (a) shows the BER results of the TSVQ
cooperative system, which is set up in accordance with the example for 16QAM mapping in
Section III-A. Its BER is compared against the performance of the system with the Shannon
coding bound. According to equation (2), the compression noise $w_{cn1}$ is considered for this kind of
system, with the Shannon coding bound of its variance calculated via (3). As shown in this figure,
the Voronoi VQ obtains performance which is quite close to the Shannon coding bound. Even
though the TSVQ we designed here is suboptimal, it is much simpler to design and operate and
can achieve error ratios comparable to the optimal but more complicated Voronoi VQ. For a given
quantization rate of 6 bits/sample, the Voronoi VQ requires $O(65536(T_d + T_c))$ computations,
which is much larger than the $O(2048T_d + 4096T_c)$ computations required by TSVQ.

Additionally, considering the quantization rates 5 bits/sample and 7 bits/sample, we compare
the performance of the Voronoi VQ and TSVQ in Fig. 5 (b). It can be seen that the
TSVQ for $C = 6$ bits/sample in Fig. 5 (a) performs almost the same as the Voronoi VQ for
$C = 5$ bits/sample in Fig. 5 (b). According to (9) and (10), this means that the TSVQ with encoding
complexity $(9T_d + 8T_c)$ is able to achieve the performance of the Voronoi VQ with $(32T_d + 31T_c)$
complexity. Also, the TSVQ for $C = 7$ bits/sample performs similarly to the Voronoi VQ for
$C = 6$ bits/sample. That is, TSVQ can approach the performance of Voronoi VQ with roughly 4
times lower encoding complexity. TSVQ also requires a much lower computational complexity
for the codebook design. Thus the TSVQ we implemented here is more efficient than the
Voronoi VQ, and is a better choice for CF cooperation to enable the virtual-MIMO system
to achieve MIMO performance in practice.
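
The "roughly 4 times" figure can be checked with a short calculation. The sketch below assumes that a full-search encoder over $2^C$ codewords needs $2^C$ distance computations and $2^C - 1$ comparisons, which reproduces the quoted $32T_d + 31T_c$ at $C = 5$. The TSVQ cost is taken directly from the quoted $9T_d + 8T_c$ and is not derived here. Setting $T_d = T_c$ is an assumption made only to obtain a single ratio.

```python
def full_search_cost(rate_bits, t_d=1.0, t_c=1.0):
    """Assumed full-search encoding cost: 2^C distances and 2^C - 1 comparisons."""
    n = 2 ** rate_bits
    return n * t_d + (n - 1) * t_c

def cost(n_dist, n_comp, t_d=1.0, t_c=1.0):
    """Weighted cost of a given number of distance computations and comparisons."""
    return n_dist * t_d + n_comp * t_c

voronoi_c5 = full_search_cost(5)  # 32 T_d + 31 T_c, as quoted for Voronoi VQ at C = 5
tsvq_c6 = cost(9, 8)              # 9 T_d + 8 T_c, as quoted for TSVQ at C = 6
ratio = voronoi_c5 / tsvq_c6
print(f"encoding cost ratio with T_d = T_c: {ratio:.2f}")
```

Under the equal-cost assumption the ratio is 63/17, about 3.7. If distance computations dominate ($T_d \gg T_c$), the ratio approaches 32/9, about 3.6.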

The above TSVQ results use $C_2 = 1$ bit/sample, according to the illustrative example in Section
III-A. But $C_2$ is not limited to that value: a higher $C_2$ could be considered to further decrease the
computational complexity, or $C_2 = 0$ bits/sample can be used for high-accuracy quantization,
since $C_3 = C - C_1 - C_2$. The comparison of various values of $C_2$ for the TSVQ cooperative
system is shown in Fig. 6. We assume a total quantization rate $C = 7$ bits/sample for 16QAM
mapping in this figure. The corresponding design and encoding complexities, together with the
$E_b/N_0$ loss at a BER of $10^{-3}$, are shown in TABLE I. Compared to the Voronoi VQ, the TSVQ
with $C_2 = 0$ bits/sample can reduce the codebook design complexity to one sixteenth, and
decrease the encoding complexity to one quarter, for an $E_b/N_0$ performance penalty of only 1.3 dB.
As $C_2$ increases, both the design and encoding complexities reduce, but the BER performance
becomes worse. Since both the second and third stages of TSVQ contribute to the complexities
according to (8) and (10), the complexity reduction is not proportional to the increase of $C_2$.
Neither does the performance loss (i.e. the $E_b/N_0$ loss) scale linearly with $C_2$. To achieve a
favourable performance-complexity tradeoff, $C_2 = 0$ or 1 bit/sample is a good choice for this case.
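
As a small worked example of the rate split $C_3 = C - C_1 - C_2$, the sketch below tabulates the third-stage rate for several middle-stage rates. It assumes $C_1 = 2$ bits/sample, as in the 16QAM example. The set of $C_2$ values is illustrative and is not claimed to be the set used in Fig. 6.

```python
def tsvq_rate_split(total_rate, c1=2, c2_values=(0, 1, 2, 3)):
    """Map each middle-stage rate C2 to the remaining third-stage rate C3 = C - C1 - C2."""
    return {c2: total_rate - c1 - c2
            for c2 in c2_values
            if total_rate - c1 - c2 >= 0}

for c2, c3 in tsvq_rate_split(7).items():
    print(f"C = 7, C1 = 2, C2 = {c2} -> C3 = {c3} bits/sample")
```

For $C = 7$ and $C_2 = 1$ this gives $C_3 = 4$, the split used in the CFO experiments of Section V-B.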

B. Effects of Carrier Frequency Offsets 

According to the 3GPP standards, the tolerance required for frequency accuracy is from ±0.1
ppm (parts per million) to ±0.25 ppm [33]. This translates to a CFO of up to ±500 Hz at
a carrier frequency of 2 GHz. For practical considerations, we assume the CFO at the destination
follows a uniform distribution, i.e. $f_{dn} \sim U[-500, 500]$ Hz, and again $f_{rn} = f_{dn} \pm \Delta$. Moreover,
a generic sub-frame structure defined in 3GPP LTE is adopted [34]: one sub-frame is made up
of two 0.5 ms slots, each made of seven symbols, and thus the symbol interval is $t_s = 71.4\ \mu$s.
The frequency range that can be estimated is therefore ±3500 Hz [12], which covers the
maximum CFO assumed here. Given a specific $H_n$, the relay and the destination perform clock
synchronization, and then jointly estimate and compensate their CFOs for each sub-frame.
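
The numbers in this paragraph follow from short calculations, sketched below. The estimable range is derived from the form of (15): since $\arg[\cdot]$ lies in $(-\pi, \pi]$, the estimate is confined to $|f| < 1/(4t_s)$. The function names are hypothetical.

```python
def max_cfo_hz(tolerance_ppm, carrier_hz):
    """Largest CFO implied by an oscillator tolerance in parts per million."""
    return tolerance_ppm * 1e-6 * carrier_hz

def symbol_interval_s(slot_s=0.5e-3, symbols_per_slot=7):
    """Symbol interval for a slot of the given duration and symbol count."""
    return slot_s / symbols_per_slot

def estimable_range_hz(t_s):
    """Half-width of the unambiguous range of (15): pi / (2*pi*2*t_s) = 1 / (4*t_s)."""
    return 1.0 / (4.0 * t_s)

t_s = symbol_interval_s()
print(f"max CFO at 2 GHz, 0.25 ppm: {max_cfo_hz(0.25, 2e9):.0f} Hz")
print(f"symbol interval: {t_s * 1e6:.1f} us")
print(f"estimable range: +/-{estimable_range_hz(t_s):.0f} Hz")
```

The maximum assumed CFO of 500 Hz therefore uses about one seventh of the unambiguous range, which leaves a wide margin against phase wrapping.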
The BLER performance of the $2 \times 2$ virtual-MIMO system with or without clock synchronization
for QPSK modulation is shown in Fig. 7. To focus on the effects of residual CFO,
we first apply the Voronoi VQ with $C = 4$ bits/sample for cooperation. For the cooperative
system without clock locking, CFOs cause a drastic performance degradation compared to the
ideal MIMO case. In contrast, the proposed clock synchronization and joint CFO estimation
scheme provides a significant performance advantage, with a BLER that is lower bounded by the
perfect clock locking case (i.e. $\Delta = 0$ Hz). A family of dash-dot curves shows the performance
for various degrees of synchronization. For a large value of $\Delta$, there exists an error floor,
caused by the residual CFO, which equals $\Delta/2$ according to (18). When $\Delta > 100$ Hz, joint
CFO estimation is of little value, as it does not provide a performance advantage compared to
the case without clock locking. For a target BLER of $10^{-2}$, a synchronization error smaller than
60 Hz provides very good performance, close to the lower bound.

A similar trend can be seen in Fig. 8 for 16QAM mapping: maintaining clock synchronization
and jointly estimating the CFO at the relay and the destination can provide a significant BER
improvement, bringing performance close to the ideal MIMO case. Comparing the results for 16QAM and
QPSK, it is clear that 16QAM needs a higher level of synchronization because of its higher-order
constellation, whereas QPSK can tolerate a larger synchronization error. For 16QAM, $\Delta < 25$
Hz offers acceptable performance, while $\Delta < 20$ Hz provides very good performance
for the virtual-MIMO system. Moreover, besides the Voronoi VQ with $C = 7$ bits/sample, this
figure also applies the TSVQ we designed ($C_1 = 2$, $C_2 = 1$, $C_3 = 4$ bits/sample) to enable CF
cooperation for a specific case of the proposed scheme where $\Delta = 20$ Hz is considered. The
system performance with TSVQ follows a similar trend to that of the system with the Voronoi VQ. For
a target BLER of $10^{-2}$, TSVQ at $\Delta = 20$ Hz performs almost the same as the Voronoi VQ at
$\Delta = 25$ Hz, but TSVQ has a much lower complexity.
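
One way to build intuition for these thresholds is to convert the residual CFO of $\Delta/2$ from (18) into an accumulated phase rotation. The sketch below is a back-of-the-envelope calculation and not the paper's analysis. It assumes the phase reference is taken at the start of a 1 ms sub-frame and that no other phase tracking is applied within the sub-frame.

```python
import math

def residual_phase_drift_deg(delta_hz, duration_s=1e-3):
    """Phase rotation accumulated over duration_s by a residual CFO of delta/2."""
    residual_cfo_hz = delta_hz / 2.0  # from (18)
    return math.degrees(2 * math.pi * residual_cfo_hz * duration_s)

for delta in (20, 25, 60, 100):
    print(f"delta = {delta:3d} Hz -> {residual_phase_drift_deg(delta):.1f} degrees per sub-frame")
```

Under these assumptions the drift is 3.6 degrees at $\Delta = 20$ Hz and 18 degrees at $\Delta = 100$ Hz, which is consistent with the observation that the denser 16QAM constellation tolerates a smaller synchronization error than QPSK.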

VI. Conclusions 

In this paper, a cooperative virtual-MIMO system using two transmit antennas that implements
BICM transmission and CF cooperation between two receiving nodes was presented. We showed
that CF cooperation using standard source coding at the relay can enable the virtual-MIMO
system to achieve almost MIMO performance. Two codebook design algorithms
were then presented, Voronoi VQ and TSVQ, based on knowledge of the noise-free constellation.
A comparison in terms of the codebook design complexity and encoding complexity was also
presented. We have shown that, compared to the Voronoi VQ, the TSVQ can reduce the codebook
design complexity to less than one sixteenth, and decrease the encoding complexity to less than
one quarter, for a performance penalty of only 1.3 to 1.6 dB. A higher middle-stage quantization
rate $C_2$ could be considered to further decrease the complexities, or $C_2 = 0$ bits/sample used
for high-accuracy quantization. In practice, the TSVQ is a better choice for CF cooperation to
achieve a favourable performance-complexity tradeoff in the virtual-MIMO system.

Additionally, for practical considerations, we also investigated the effects of CFO, and demonstrated
that CFO can lead to drastic performance degradation for the $2 \times 2$ virtual-MIMO system.
A scheme which maintains clock synchronization and jointly estimates the CFO between the relay
and the destination was proposed to overcome the limitations of separate CFO estimation at the
relay and destination. Simulation results showed that the proposed scheme provides a significant
performance improvement. For a target BLER of $10^{-2}$, a synchronization error smaller than 60
Hz for QPSK and 20 Hz for 16QAM mapping offers good performance, close to the case
with perfect clock locking.

This paper dealt with two practical issues for the virtual-MIMO system with CF cooperation.
We designed an efficient TSVQ as the source coding technique at the relay, and proposed the
clock synchronization and joint CFO estimation scheme so that the cooperation is not unduly
sensitive to CFOs. The TSVQ is not limited to 16QAM mapping: its principles
could easily be applied to other modulation types and quantization rates. Also, the clock
synchronization and joint CFO estimation scheme could employ other estimation algorithms
besides McKeown's method. Because it extends to a wide range of applications, the virtual-MIMO
system is valuable and attractive for realistic wireless communication
systems. The extension to more cooperating terminals with more antennas is left as future work.
Fig. 1. System model of the cooperative virtual-MIMO system. (TX and RX stand for the transmitter and receiver.)

Fig. 2. Noise-free constellation of the received signals and the codebook designed at the relay with $C = 6$ bits/sample. (The
rotational symmetry of the noise-free constellation is shown in (a); the conceptual diagram of the design process for TSVQ is
in (b); the TSVQ codebook with $C_1 = 2$ bits/sample and $C_2 = 1$ bit/sample is shown in (c); the Voronoi VQ codebook is in
(d).)

Fig. 3. Carrier frequency offsets at the relay and the destination.

Fig. 4. BLER results of the cooperative virtual-MIMO system with Voronoi VQ at the relay and ML receiver at the destination.
(Solid red curves correspond to 16QAM mapping, while dash-dotted black curves correspond to QPSK mapping.)

Fig. 5. BER performance of the cooperative virtual-MIMO system with TSVQ or Voronoi VQ at the relay. (Quantization rate
6 bits/sample is considered in (a), while 5 and 7 bits/sample are considered in (b).)

Fig. 6. BER comparison between the Voronoi VQ and TSVQ in the virtual-MIMO system. ($C = 7$ bits/sample for the Voronoi
VQ, and various $C_2$ for TSVQ are considered.)

TABLE I

Complexity and performance comparison between the Voronoi VQ and TSVQ.
[Table body omitted; its columns include the $E_b/N_0$ loss.]

Fig. 7. BLER performance of the virtual-MIMO system with or without frequency synchronization for QPSK mapping. (Various
degrees of synchronization and Voronoi VQ are considered.)

Fig. 8. BLER performance of the virtual-MIMO system with various degrees of frequency synchronization for 16QAM mapping.
(Both Voronoi VQ and TSVQ are considered.)